Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Perhaps improve the documentation of drain members further #93769

Closed
wants to merge 2 commits into from

Conversation

ssomers
Copy link
Contributor

@ssomers ssomers commented Feb 8, 2022

Building on #92902, let's try adding clauses about mem::forget in all of the drainy functions.

Reasons to do that:

  • These methods consume (part of) the contents a container, but obviously the argument type does not make that clear and has to be a plain &mut. That makes a huge difference compared to into_iter.
  • Apparently people do use mem::forget a lot and are surprised by its effect, see the history of the notice on vec::drain.

Reasons not to do that:

  • It makes the description overwhelming. To mitigate that, I moved these paragraphs into a separate section.
  • It may lock in a contract that a later implementation may wish to violate. Though as far as I see that's only the case for string::drain.

I'd rather move the Leaking section below Panics because it seems much more of a detail, just didn't do that to reduce changes where the clause exists already.

r? @yaahc

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Feb 8, 2022
@rust-log-analyzer

This comment has been minimized.

library/alloc/src/collections/linked_list.rs Outdated Show resolved Hide resolved
library/alloc/src/collections/binary_heap.rs Outdated Show resolved Hide resolved
library/alloc/src/collections/linked_list.rs Outdated Show resolved Hide resolved
Comment on lines -1218 to +1219
/// Creates a draining iterator that removes the specified range in the
/// `VecDeque` and yields the removed items.
/// Removes the specified range from the `VecDeque`, returning all removed
/// elements as an iterator.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind-of prefer the original sentence here. What’s the motivation for the change? The call to drain itself does not do anything (yet) beyond creating an iterator. The returned iterator has a lifetime coupled to the mutable borrow; for an operation that “Removes the specified range … returning all removed elements as an iterator” one could expect an iterator that owns the elements and can be created with a shorter mutable borrow not coupled to a lifetime parameter of the type of the iterator.

The same applies to Vec and String

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is in the commit for #92902 and should be discussed there or in #92765.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see. I didn't realize that there's another unmerged PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I didn't realize anyone would notice this PR this month.

Copy link
Member

@steffahn steffahn Feb 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just happened to look at the list of latest/new open PRs today, (mainly because I have an open PR myself where I'm a bit impatiently checking whether anything has happened w.r.t. review yet,) then happend to be quickly looking into this one for whatever reason and probably wouldn't even have started any review if I hadn't spotted the typo of "closure". 😅 I hope you don't mind my lengthy side-discussion about validity of collection types that I kind-of opened in this thread 🙈

Comment on lines +1636 to +1637
/// In case the iterator disappears without getting dropped (using
/// [`core::mem::forget`], for example), the range remains in the string.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t understand why this would need to be specified. It encourages usage of mem::forget which usually shouldn’t be used on foreign types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my description.

Copy link
Contributor Author

@ssomers ssomers Feb 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or clarify what you mean with "this": the entire paragraph, or the constraint about what can happen if you leak an instance of this iterator?

Copy link
Member

@steffahn steffahn Feb 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, this paragraph kind-of says to me "If you want to, during iteration, change your mind and keep the range inside of the original string, feel free to go ahead and just mem::forget the iterator. This is a supported use-case and we guarantee the precise behavior in this case". I.e. it promises a certain behavior on mem::forgetting the iterator, and thus (implicitly) encourages people to deliberately leak the iterator to achieve certain behavior. If we wanted to offer an actual documented and guaranteed way of avoiding the draining, then it should be via a (self-consuming) method on the iterator, not by giving any promises on "what happens on mem::forget IMO. (E.g. one could create a fn cancel(self) on the string::Drain iterator that explicitly doesn't remove anything from the String, or vec::Drain could provide a method that puts back the remaining elements into the original Vec.)

All we should do in these kinds of documentation is say, very explicitly, something along the lines of "please don't ever skip dropping these iterators properly, it is considered a logic error and can, among other things, leave the original structure in an arbitrary unspecified (yet safe-to-use in terms of memory-safety) state with potentially surprising and unexpected behavior when you continue using it afterwards, it can result in arbitrary amounts of additional leaks of the to-be-drained items and/or other items in the same original data structure". I.e. strongly discourage actually leaking the iterators, and perhaps provide a non-exhaustive list of things that can happen upon leaking it anyways, without giving any promises that constrain the set of things that can happen any more than necessary by Rust's memory safety guarantees.

Copy link
Member

@steffahn steffahn Feb 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my description.

Do you mean this following sentence?

Apparently people do use mem::forget a lot and are surprised by its effect, see the history of the notice on vec::drain.

If "the history of the notice on vec::drain" is supposed to explain this, can you perhaps provide some links for me to read up on this?

Copy link
Contributor Author

@ssomers ssomers Feb 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my description.

Do you mean this following sentence?

I mean primarily that the description already lists reasons why it would need to be specified as well as reasons not to.

Apparently people do use mem::forget a lot and are surprised by its effect, see the history of the notice on vec::drain.

If "the history of the notice on vec::drain" is supposed to explain this, can you perhaps provide some links for me to read up on this?

I went over #43244 again and don't see it. It's in the existence of #74652 and its predecessors introducing the paragraph. But I probably misinterperted #73844: it's the description that inspired the clarification, not someone actually using mem::forget and expecting to still look at the vector later. Some other posts too I'm sure, but probably less than I thought.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vec::Drain could provide a method that puts back the remaining elements into the original Vec.)

Intriguing idea, I think no one mentioned that possibility yet.

All we should do in these kinds of documentation is say, very explicitly, something along the lines of

I agree, but apparently no one wanted to discourage users in the Vec::drain description.

@steffahn
Copy link
Member

steffahn commented Feb 8, 2022

The wording here implicitly suggests that there’s a guarantee that e.g. BinaryHeap would still be in a valid state after a Drain iterator is leaked. If elements were to still be remaining, it could arguably, in a future implementation, be decided to leave it in a state where the heap properties are broken (w.r.t. ordering) by leaking such a Drain iterator.

For comparison, I don’t know how e.g. HashMap and BTreeMap currently behave w.r.t. leaking iterators, and whether their (non-memory-safety-related) invariants can be broken this way.

If we were to decide that an invalid state is allowed as an outcome from leaking a BinaryHeap, then the wording here should probably explicitly call that out. E.g. instead of it is unspecified whether the remaining elements are still in the heap or leaked. something like it is unspecified how many of the remaining elements are leaked, and the heap may be left in an invalid state. or maybe some wording in terms of the term “logic error” which is also used in the docs when addressing potential non-conforming Ord or Hash implementations.


This relates to an interesting question I had a while ago and have not found the answer to yet: Does the Rust standard library guarantee that it’s impossible to (safely) create e.g. a BTreeMap that’s in an invalid state, provided the contained types are free of interior mutability and their Ord implementation is correctly following the required axioms?

E.g. unsafe code that gets passed in an arbitrary HashSet<i32> or BTreeSet<i32> might expect to be able to rely on the fact that once a number is known to be or not to be in the set (by checking set.contains(&number)), when it subsequently modifies the set (w.r.t. other, unrelated numbers), the output of set.contains(&number) doesn’t arbitrary change, i.e. the set if well-behaved (which is only ensured when it’s not in an invalid state).

@ssomers
Copy link
Contributor Author

ssomers commented Feb 8, 2022

The wording here implicitly suggests that there’s a guarantee that e.g. BinaryHeap would still be in a valid state after a Drain iterator is leaked.

I thought this valid state is an implicit expectation or an explicit guarantee somewhere for any Rust class. Another one is that using mem::forget, elements are not cloned (even if they are Clone). Therefore all yielded elements must have been removed.

If this is not a guarantee, then you're saying that the existing clause about mem::forget in vec::drain is in fact a guarantee, not a warning about how bad it can be?

@ssomers
Copy link
Contributor Author

ssomers commented Feb 8, 2022

This relates to an interesting question

I don't understand that question. Obviously, by definition, safely creating a collection means it's valid? Even if the Ord implementation stinks, it must still be valid, though behaving weirdly. I'm not so confident about interior mutability but since you exclude that too, what could possible go wrong?

@steffahn
Copy link
Member

steffahn commented Feb 8, 2022

By valid state I mean thing like

  • for a HashMap, the hashes stored inside of the data structure correspond properly to the actual hashes of the contained values w.r.t. their Hash implementation
  • for a BTreeMap, the entries are such that the keys are in-order and without duplicates, w.r.t. the Ord implementation of the key type
  • for BinaryHeap, the values are forming a max-heap w.r.t. the element-type's Ord implementations, i.e. in the (encoded) almost-complete binary tree, each node's value compares larger-than-or-equal-to each each of its children

Of course, these properties can be broken by a "bad" Ord implementation or Hash implementation. But e.g. if for a BTreeSet, the Ord implementation properly, deterministically, and without (in terms of the order) observable interior mutability implements a total preorder, a set that is in a valid state i.e. the keys are in-order without duplicates, you can rely on the result of such things like set.contains(&element). If you never insert an element equal to this &element afterwards, then future calls to set.contains(&element) will keep returning false, and unsafe code might want to depend on BTreeSet being implemented correctly and bug-free. However, for mis-behaving Ord implementation or a map in an invalid state, the answer of a call to set.contains(&element) can change unpredictably from seemingly-unrelated modifications to the set.

My question here would be whether such unsafe code could work with a BTreeSet<i32> from an untrusted (but safe Rust code) source, and soundly rely on this set being in a valid state. This question is, as I noted above, just "an interesting question I had a while ago", nothing all that relevant to answer for this particular PR. The only way in which it is a bit relevant is because I'm suggesting that mem::forgetting a drain-iterator might be declared a "logic error" in the documentation, implying the type might be left in an invalid state; and this might be the first part of API of these data structures that declares a "logic error" that's not due to an ill-defined trait implementation or interior mutability. If only those things could result in logic errors, then BTreeSet<i32> could never get into an invalid state because i32 properly implements Hash/Ord and it doesn't have interior mutability.

@steffahn
Copy link
Member

steffahn commented Feb 8, 2022

I thought this valid state is an implicit expectation or an explicit guarantee somewhere for any Rust class. Another one is that using mem::forget, elements are not cloned (even if they are Clone). Therefore all yielded elements must have been removed.

Let me also individually address parts of your response.

I thought this valid state is an implicit expectation or an explicit guarantee somewhere for any Rust class. Another one is that using mem::forget, elements are not cloned (even if they are Clone). Therefore all yielded elements must have been removed.

Right, I'm not talking about a collection in an invalid state being able to violate any of Rust's safety guarantees. Yieled elements in a generic collection are gone when yielded, naturally. For String this doesn't have to be the case. The collection still is in a safe-to-use state where you cannot cause undefined behavior with its safe API, but it can behave very weird within the bounds of memory-safe behavior.

Obviously, by definition, safely creating a collection means it's valid?

As explained in my previous comment, I mean a different notion of "valid" (related to avoiding what's currently documented as being a "logic error" in some places of the std docs) which is possible to violate in safe code.

Even more, an invalid state under this notion, still usually should not be able to leak memory, AFAIK memory leaks in Rust are only supposed to happen when other things are already leaked (i.e. leaking one thing can result in more leakage) or by creating a reference-cycle in an Arc/Rc or similar structure, or by explicitly calling a function that is meant to leak data like Box::leak or mem::forget or using ManuallyDrop, etc.


Just to be sure this is clear, I'm mostly answering these questions because you asked, but this was just a side-note of an idea I had in my head once and I felt like writing down; it's not really all that relevant to this PR, me or you don't have to answer this "interesting question I had a while ago" of mine in this thread, it was just a side-note that is at most relevant for the question of whether or not to explicitly call leaking a drain-iterator a "logic error" (though any decision on whether or not to call it a "logic error" now shouldn't really prevent changing the documentation in this regard later).

@ssomers
Copy link
Contributor Author

ssomers commented Feb 9, 2022

By valid state I mean thing like

Yes, I agree.

Of course, these properties can be broken by a "bad" Ord implementation

Sure.

My question here would be whether such unsafe code could work with a BTreeSet<i32> from an untrusted (but safe Rust code) source, and soundly rely on this set being in a valid state.

Yes, in the BTreeMap PRs I submitted it was made clear to me that no operation or panic may invalidate the tree. And now I do remember em::forget of the DrainFilter iteratior coming up. So now I realize that your assessment that "The wording here implicitly suggests that there’s a guarantee that e.g. BinaryHeap would still be in a valid state after a Drain iterator is leaked" is spot on: validity is a given.

@ssomers
Copy link
Contributor Author

ssomers commented Feb 9, 2022

Obviously, by definition, safely creating a collection means it's valid?

As explained in my previous comment, I mean a different notion of "valid" (related to avoiding what's currently documented as being a "logic error" in some places of the std docs) which is possible to violate in safe code.

I don't think we have different notions of "valid" in the context of a well-behaved Ord. If Ord violates the rules, I would say a BTrreeMap is still valid, as in the definition of check_invariants, but not to the degree of assert_strictly_ascending simply because you can't speak about order if there isn't any worthy of the name. But we might as well say that an invalid order invalidates the map.

@ssomers ssomers marked this pull request as draft February 9, 2022 11:56
@ssomers
Copy link
Contributor Author

ssomers commented Feb 9, 2022

If elements were to still be remaining, it could arguably, in a future implementation, be decided to leave it in a state where the heap properties are broken (w.r.t. ordering) by leaking such a Drain iterator.

To summarize all above, I remember reading this would never be accepted. But I can't find it back.

@ssomers
Copy link
Contributor Author

ssomers commented Feb 9, 2022

Merged into #93769 #92902

@ssomers ssomers closed this Feb 9, 2022
@steffahn
Copy link
Member

steffahn commented Feb 9, 2022

So now I realize that your assessment that "The wording here implicitly suggests that there’s a guarantee that e.g. BinaryHeap would still be in a valid state after a Drain iterator is leaked" is spot on: validity is a given.

To summarize all above, I remember reading this would never be accepted. But I can't find it back.

Would be really interested to learn more about this, maybe we should ask somewhere else, e.g. on Zulip.


I don't think we have different notions of "valid" in the context of a well-behaved Ord. If Ord violates the rules, I would say a BTrreeMap is still valid, as in the definition of check_invariants, but not to the degree of assert_strictly_ascending simply because you can't speak about order if there isn't any worthy of the name. But we might as well say that an invalid order invalidates the map.

Very sensible approach. In the discussion so far, I really only cared about "validity" in the context of a well-behaved Ord. In my mind Ord being well-behaved is a precondition to even sensibly distinguish "valid" from "invalid (yet still memory-safe to use)" maps.

If there really is a guarantee that you can, for well-behaved Ord types, not safely construct an "invalid" map, then I’m fine with extending the definition of "valid" map to non-well-behaved-Ord types, in this case with the meaning you proposed, i.e. .assert_strictly_ascending does not need to be true anymore. This would be useful because then the guarantee would be "safely constructed maps are always valid". Creation of e.g. a BTreeSet<i32> that violates .assert_strictly_ascending could even be considered library-UB.

On the other hand, I’m not sure what how much really is gained from making creation of an invalid map unsafe/unsound. If construction of BTreeMap/BTreeSet that violate assert_strictly_ascending despite a well-behaved Ord would be regarded merely a "logic error", yet not "unsound", then we could offer safe and efficient methods like something like a BTreeSet::from_sorted_iter_unchecked function which skips the need to do any comparisons when constructing a BTreeSet from a known-to-be-already-sorted iterator. Using such a method with an unsorted iterator would be strongly discouraged, and considered a logic error, but ultimately only about as bad as abusing mem::forget or similar operations.

@steffahn
Copy link
Member

steffahn commented Feb 9, 2022

Merged into #93769

you mean #92902?

@ssomers
Copy link
Contributor Author

ssomers commented Feb 9, 2022

you mean #92902?

Indeed. Just trying to get github stuck in a loop…

@steffahn
Copy link
Member

steffahn commented Feb 9, 2022

Would be really interested to learn more about this, maybe we should ask somewhere else, e.g. on Zulip.

I’ve opened this thread: https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/Validity.20guarantees.20of.20standard.20library.20data.20types.3F (or: read-only archive link)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants